Model-based Synthesis and Transformation of Voiced Sounds
Authors
Abstract
In this work, a glottal model loosely based on the Ishizaka and Flanagan model is proposed, in which the number of parameters is drastically reduced. First, the glottal excitation waveform is estimated, together with the vocal tract filter parameters, using inverse filtering techniques. The estimated waveform is then used to identify the nonlinear glottal model, represented by a closed-loop configuration of two blocks: a second-order resonant filter, tuned with respect to the signal pitch, and a regressor-based functional, whose coefficients are estimated via nonlinear identification techniques. The results show that real data can be identified accurately with fewer than 10 regressors of the nonlinear functional, and that fundamental features such as pitch and intensity can be controlled intuitively by acting on the physically informed parameters of the model.
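As a rough illustration of one building block named in the abstract, the sketch below implements a generic second-order (two-pole) resonant filter whose resonance is placed near a given pitch. It is not the authors' model: the function names, the pole radius `r`, the gain normalisation, and the sample rate are assumptions chosen for the example, and the closed-loop coupling with the regressor-based functional is not reproduced.

```python
import cmath
import math

def resonator_coeffs(f0, fs, r=0.99):
    """Coefficients of y[n] = b0*x[n] - a1*y[n-1] - a2*y[n-2].

    The poles sit at radius r and angle 2*pi*f0/fs, so the filter
    resonates close to f0. r (assumed value) sets the bandwidth.
    """
    theta = 2.0 * math.pi * f0 / fs
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    b0 = 1.0 - r  # arbitrary gain scaling, assumed for illustration
    return b0, a1, a2

def resonate(x, f0, fs, r=0.99):
    """Run the input sequence x through the resonator, sample by sample."""
    b0, a1, a2 = resonator_coeffs(f0, fs, r)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for xn in x:
        yn = b0 * xn - a1 * y1 - a2 * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out

def magnitude_at(f, f0, fs, r=0.99):
    """Magnitude of the resonator's frequency response at frequency f."""
    b0, a1, a2 = resonator_coeffs(f0, fs, r)
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    return abs(b0 / (1.0 + a1 * z1 + a2 * z1 * z1))

if __name__ == "__main__":
    fs, f0 = 16000, 120.0  # assumed sample rate and pitch
    impulse = [1.0] + [0.0] * 399
    y = resonate(impulse, f0, fs)
    print("gain at f0:", magnitude_at(f0, f0, fs))
    print("gain one octave up:", magnitude_at(2 * f0, f0, fs))
```

Retuning the filter to a new pitch only requires recomputing `a1` from the new `f0`, which loosely mirrors the idea of controlling pitch through a small number of physically meaningful parameters.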
Similar resources
Phase models in analysis/synthesis of voiced sounds
This article presents an overview of the problems involved in modeling the phase in analysis/synthesis of voiced sounds. A number of informal experiments for monaural sounds are presented, demonstrating the problems and possible improvements to these kinds of systems. Furthermore, a number of psycho-acoustic experiments are presented, for assessing the importance of phase information regarding ...
A mixed-excitation frequency domain model for time-scale pitch-scale modification of speech
This paper presents a time-scale pitch-scale modification technique for concatenative speech synthesis. The method is based on a frequency domain source-filter model, where the source is modeled as a mixed excitation. This model is highly coupled with a compression scheme that results in compact acoustic inventories. When compared to the approach in the Whistler system using no mixed excitation,...
On the importance of phase information in additive analysis/synthesis of binaural sounds
This article presents a number of psycho-acoustic experiments for assessing the importance of phase information regarding the spatial qualities of synthesized sounds from voiced instruments. Binaural recorded sounds are synthesized using additive analysis/synthesis with and without phase information. The phase information is used in the synthesis to preserve the characteristics of the waveform....
Towards an oscillator-plus-noise model for speech synthesis
The autonomous oscillator model for speech synthesis is augmented by a nonlinear predictor to regenerate the modulated noise-like signal component of speech signals. The resulting ‘oscillator-plus-noise’ model in combination with vocal tract modeling by linear prediction is able to regenerate the spectral content of stationary wide-band vowel signals with high fidelity. For adequate modeling of ...
An Auditory Model of Speaker Size Perception for Voiced Speech Sounds
An auditory model was developed to explain the results of behavioral experiments on perception of speaker size with voiced speech sounds. It is based on the dynamic, compressive gammachirp (dcGC) filterbank and a weighting function (SSI weight) derived from a theory of size-shape segregation in the auditory system. Voiced words with and without high-frequency emphasis (+6 dB/octave) were produce...
An HMM-based speech synthesiser using glottal post-filtering
Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to the speaker’s identity and to improve expressiveness. However, it is hard to modify voice characteristics of the synthetic speech without degrading speech quality. State-of-the-art statistical speech synthesiser...